Diffusion Probabilistic Models (DPMs) have shown a powerful capacity for generating high-quality image samples. Recently, diffusion autoencoders (Diff-AE) have been proposed to explore DPMs for representation learning via autoencoding. Their key idea is to jointly train an encoder that discovers meaningful representations from images and a conditional DPM that serves as the decoder for reconstructing images. Considering that training DPMs from scratch takes a long time and that numerous pre-trained DPMs already exist, we propose \textbf{P}re-trained \textbf{D}PM \textbf{A}uto\textbf{E}ncoding (\textbf{PDAE}), a general method to adapt existing pre-trained DPMs into decoders for image reconstruction, with better training efficiency and performance than Diff-AE. Specifically, we find that pre-trained DPMs fail to reconstruct an image from its latent variables because of the information loss in the forward process, which causes a gap between their predicted posterior mean and the true one. From this perspective, the classifier-guided sampling method can be explained as computing an extra mean shift to fill the gap, reconstructing the lost class information in samples. Together, these observations imply that the gap corresponds to the lost information of the image, and that we can reconstruct the image by filling the gap. Drawing inspiration from this, we employ a trainable model to predict a mean shift from the encoded representation and train it to fill as much of the gap as possible. In this way, the encoder is forced to learn as much information as possible from images to help the filling. By reusing part of the network of pre-trained DPMs and redesigning the weighting scheme of the diffusion loss, PDAE can learn meaningful representations from images efficiently. Extensive experiments demonstrate the effectiveness, efficiency, and flexibility of PDAE.
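The mean-shift view can be written compactly. The first line below is the standard form of classifier-guided sampling with guidance scale $s$. The second line is only a sketch of the analogous representation-guided step: the symbols $G_\psi$ (trainable shift predictor), $E_\omega$ (encoder), and $z$ (encoded representation) are notation assumed here, since the abstract does not name them, and scaling the shift by $\Sigma_\theta$ is assumed by analogy with the classifier-guided case.

```latex
% Classifier-guided sampling: an extra mean shift carries the class information y
x_{t-1} \sim \mathcal{N}\!\big(\mu_\theta(x_t, t) + s\,\Sigma_\theta(x_t, t)\,\nabla_{x_t}\log p_\phi(y \mid x_t),\ \Sigma_\theta(x_t, t)\big)

% Assumed analogue: a learned mean shift conditioned on the representation z
x_{t-1} \sim \mathcal{N}\!\big(\mu_\theta(x_t, t) + \Sigma_\theta(x_t, t)\,G_\psi(x_t, z, t),\ \Sigma_\theta(x_t, t)\big),
\qquad z = E_\omega(x_0)
```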
Recently, RNN-Transducers have achieved remarkable results on various automatic speech recognition tasks. However, lattice-free sequence discriminative training methods, which obtain superior performance in hybrid models, are rarely investigated for RNN-Transducers. In this work, we propose three lattice-free training objectives, namely lattice-free maximum mutual information, lattice-free segment-level minimum Bayes risk, and lattice-free minimum Bayes risk, which are applied to the final posterior output of a phoneme-based neural transducer with limited context dependency. Compared to criteria that use N-best lists, lattice-free methods eliminate the decoding step that generates hypotheses during training, which leads to more efficient training. Experimental results show that lattice-free methods gain up to 6.5% relative improvement in word error rate compared to a model trained with sequence-level cross-entropy. Compared to the N-best-list based minimum Bayes risk objectives, lattice-free methods gain a 40% to 70% relative speedup in training time with a small degradation in performance.
The Shapley value (SV) is adopted in various scenarios in machine learning (ML), including data valuation, agent valuation, and feature attribution, as it satisfies their fairness requirements. However, as exact SVs are infeasible to compute in practice, SV estimates are used instead. This approximation step raises an important question: do the SV estimates preserve the fairness guarantees of exact SVs? We observe that the fairness guarantees of exact SVs are too restrictive for SV estimates. Thus, we generalise Shapley fairness to probably approximate Shapley fairness and propose the fidelity score, a metric that measures the variation of SV estimates and determines how probable it is that the fairness guarantees hold. Our last theoretical contribution is a novel greedy active estimation (GAE) algorithm that maximises the lowest fidelity score and achieves a better fairness guarantee than the de facto Monte-Carlo estimation. We empirically verify that GAE outperforms several existing methods in guaranteeing fairness while remaining competitive in estimation accuracy in various ML scenarios using real-world datasets.
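To make the approximation step concrete, here is a minimal sketch of the Monte-Carlo baseline the abstract refers to, in its common permutation-sampling form, next to an exact computation for comparison. It does not implement GAE or the fidelity score, whose details the abstract does not give. The toy game, its weights, and all function names are hypothetical and chosen only for illustration.

```python
import random
from itertools import permutations
from math import factorial


def marginal_contributions(order, value):
    """Yield (player, marginal gain) as players join a coalition in the given order."""
    coalition = frozenset()
    previous = value(coalition)
    for player in order:
        coalition = coalition | {player}
        current = value(coalition)
        yield player, current - previous
        previous = current


def exact_shapley(players, value):
    """Exact SVs: average marginal gains over all n! orderings (small n only)."""
    totals = {p: 0.0 for p in players}
    for order in permutations(players):
        for player, gain in marginal_contributions(order, value):
            totals[player] += gain
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in totals.items()}


def monte_carlo_shapley(players, value, num_permutations, rng):
    """SV estimates: average marginal gains over uniformly sampled orderings."""
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        for player, gain in marginal_contributions(order, value):
            totals[player] += gain
    return {p: total / num_permutations for p, total in totals.items()}


# Hypothetical toy game: a coalition is worth the square of its total weight.
WEIGHTS = {"a": 1.0, "b": 2.0, "c": 3.0}


def toy_value(coalition):
    return sum(WEIGHTS[p] for p in coalition) ** 2


players = list(WEIGHTS)
exact = exact_shapley(players, toy_value)
estimate = monte_carlo_shapley(players, toy_value, 2000, random.Random(0))
print("exact:   ", exact)
print("estimate:", estimate)
```

In this sketch the estimates always sum to the value of the full coalition, because every sampled ordering distributes that total among the players. Per-player estimates still vary from run to run, which is the kind of variation that can affect the other fairness properties.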
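Language models demonstrate both quantitative improvement and new qualitative capabilities with increasing scale. Despite their potentially transformative impact, these new capabilities are as yet poorly characterized. In order to inform future research, prepare for disruptive new model capabilities, and ameliorate socially harmful effects, it is vital that we understand the present and near-future capabilities and limitations of language models. To address this challenge, we introduce the Beyond the Imitation Game benchmark (BIG-bench). BIG-bench currently consists of 204 tasks, contributed by 442 authors across 132 institutions. Task topics are diverse, drawing problems from linguistics, childhood development, math, common-sense reasoning, biology, physics, social bias, software development, and beyond. BIG-bench focuses on tasks that are believed to be beyond the capabilities of current language models. We evaluate the behavior of OpenAI's GPT models, Google-internal dense transformer architectures, and Switch-style sparse transformers on BIG-bench, across model sizes spanning millions to billions of parameters. In addition, a team of human expert raters performed all tasks in order to provide a strong baseline. Findings include: model performance and calibration both improve with scale, but remain poor in absolute terms (and when compared with rater performance); performance is remarkably similar across model classes, though with benefits from sparsity; tasks that improve gradually and predictably commonly involve a large knowledge or memorization component, whereas tasks that exhibit "breakthrough" behavior at a critical scale often involve multiple steps or components, or brittle metrics; social bias typically increases with scale in settings with ambiguous context, but this can be improved with prompting.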
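In this paper, we investigate the forward problems of data-driven rational solitons for the (2+1)-dimensional KP-I equation and the spin nonlinear Schr\"odinger (spin-NLS) equation via deep neural network learning. Moreover, the inverse problems of the (2+1)-dimensional KP-I equation and the spin-NLS equation are studied via deep learning. The main idea of the data-driven forward and inverse problems is to use deep neural networks with activation functions to approximate the solutions of the considered (2+1)-dimensional nonlinear wave equations by optimizing chosen loss functions related to those equations.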
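A loss function of this kind is commonly written as a data-misfit term plus an equation-residual term. The form below is a generic sketch rather than the authors' exact loss: $\hat{u}_\theta$ denotes the network approximation, $\mathcal{N}[\cdot]$ stands abstractly for the nonlinear wave operator (whose explicit form the abstract does not give), and the sample counts $N_d$ and $N_f$ are assumed notation.

```latex
\mathcal{L}(\theta) =
  \frac{1}{N_d} \sum_{i=1}^{N_d} \big| \hat{u}_\theta(x_i, y_i, t_i) - u_i \big|^2
  + \frac{1}{N_f} \sum_{j=1}^{N_f} \big| \mathcal{N}[\hat{u}_\theta](x_j, y_j, t_j) \big|^2
```

In the forward problem only the network parameters $\theta$ are optimized. In the inverse problem, unknown coefficients inside $\mathcal{N}$ would be optimized jointly with $\theta$.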
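We introduce a deep neural network learning scheme to learn the B\"acklund transforms (BTs) of soliton evolution equations, and an enhanced deep learning scheme for data-driven soliton equation discovery based on known BTs. The first scheme uses information from some solutions (or soliton equations) to learn the data-driven BT of the sine-Gordon equation and the complex and real Miura transforms between the defocusing (focusing) mKdV equation and the KdV equation, as well as to discover the mKdV equation in a data-driven way via the Miura transforms. The second deep learning scheme uses explicit/implicit BTs to generate higher-order solitons for training the data-driven discovery of the mKdV and sine-Gordon equations, where the higher-order solution information makes the enhanced learning of soliton equations more powerful and more accurate.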
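Recently, Truncated Quantile Critics (TQC), which uses a distributional representation of critics, was shown to provide state-of-the-art asymptotic training performance on all environments of the MuJoCo continuous control benchmark suite. In addition, Randomized Ensemble Double Q-Learning (REDQ), which uses a high update-to-data ratio and target randomization, was shown to achieve high sample efficiency that is competitive with state-of-the-art model-based methods. In this paper, we propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves on the sample-efficiency performance of REDQ and the asymptotic performance of TQC, thereby providing overall state-of-the-art performance during all stages of training. Moreover, AQE is very simple, requiring neither a distributional representation of critics nor target randomization.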
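Techniques for dynamic map fusion among networked vehicles have been developed to enlarge the sensing range and improve the sensing accuracy of individual vehicles. This paper proposes a federated learning (FL) based dynamic map fusion framework that achieves high map quality despite an unknown number of objects in the field of view (FoV), various sensing and model uncertainties, and missing data labels for online learning. The novelty of this work is threefold: (1) developing a three-stage fusion scheme to predict the number of objects effectively and to fuse multiple local maps with fidelity scores; (2) developing an FL algorithm that fine-tunes feature models (i.e., representation learning networks for feature extraction) in a distributed manner by aggregating model parameters; (3) developing a knowledge distillation method to generate FL training labels when data labels are unavailable. The proposed framework is implemented in the Car Learning to Act (CARLA) simulation platform. Extensive experimental results are provided to verify the superior performance and robustness of the developed map fusion and FL schemes.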
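Graph Convolutional Networks (GCNs) have emerged as the state-of-the-art deep learning model for learning on graphs. However, training and inference of GCNs over large graph datasets remain notoriously challenging, which limits their application to large real-world graphs and hinders the exploration of deeper and more sophisticated GCN graphs. This is because, as the graph size grows, the sheer number of node features and the large adjacency matrix can easily explode the required memory and data movement. To tackle these challenges, we explore the possibility of drawing lottery tickets when sparsifying GCN graphs, i.e., subgraphs that largely shrink the adjacency matrix yet achieve accuracy comparable to that of the full graphs. Specifically, we discover for the first time the existence of graph early-bird (GEB) tickets that emerge at a very early stage of sparsifying GCN graphs, and propose a simple yet effective detector to automatically identify the emergence of such GEB tickets. Furthermore, we advocate graph-model co-optimization and develop a generic efficient GCN early-bird training framework, dubbed GEBT, which boosts the efficiency of GCN training by (1) drawing joint early-bird tickets between the GCN graphs and models and (2) enabling simultaneous sparsification of both the GCN graphs and models. Experiments on various GCN models and datasets consistently validate our GEB finding and the effectiveness of GEBT; e.g., GEBT achieves savings of up to 80.2%~85.6% and 84.6%~87.5% in GCN training and inference costs, respectively, while offering comparable or even better accuracy than state-of-the-art methods. Our source code and supplementary appendix are available at https://github.com/rice-eic/early-bird-gcn.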
Code generation models have achieved impressive performance. However, they tend to be brittle as slight edits to a prompt could lead to very different generations; these robustness properties, critical for user experience when deployed in real-life applications, are not well understood. Most existing works on robustness in text or code tasks have focused on classification, while robustness in generation tasks is an uncharted area and to date there is no comprehensive benchmark for robustness in code generation. In this paper, we propose ReCode, a comprehensive robustness evaluation benchmark for code generation models. We customize over 30 transformations specifically for code on docstrings, function and variable names, code syntax, and code format. They are carefully designed to be natural in real-life coding practice, preserve the original semantic meaning, and thus provide multifaceted assessments of a model's robustness performance. With human annotators, we verified that over 90% of the perturbed prompts do not alter the semantic meaning of the original prompt. In addition, we define robustness metrics for code generation models considering the worst-case behavior under each type of perturbation, taking advantage of the fact that executing the generated code can serve as objective evaluation. We demonstrate ReCode on SOTA models using HumanEval, MBPP, as well as function completion tasks derived from them. Interesting observations include: better robustness for CodeGen over InCoder and GPT-J; models are most sensitive to syntax perturbations; more challenging robustness evaluation on MBPP over HumanEval.
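The worst-case idea can be illustrated with a small sketch: a problem counts as robustly solved only if the generated code passes its tests under every perturbed variant of the prompt. This is a simplified, single-sample illustration and not the benchmark's exact metric definitions, which the abstract does not spell out. The function names and the pass/fail data are hypothetical.

```python
def nominal_pass_rate(nominal):
    """Fraction of problems solved on the original, unperturbed prompts."""
    return sum(nominal.values()) / len(nominal)


def worst_case_pass_rate(perturbed):
    """Fraction of problems solved under every perturbed variant of the prompt."""
    return sum(all(outcomes) for outcomes in perturbed.values()) / len(perturbed)


def relative_drop(nominal_rate, worst_case_rate):
    """Relative loss in pass rate when moving from nominal to worst case."""
    return (nominal_rate - worst_case_rate) / nominal_rate


# Hypothetical execution results: True means the generated code passed its tests.
nominal = {"p1": True, "p2": True, "p3": True, "p4": False}
perturbed = {
    "p1": [True, True, True],
    "p2": [True, False, True],   # one perturbation breaks the solution
    "p3": [True, True, True],
    "p4": [False, False, False],
}

nominal_rate = nominal_pass_rate(nominal)
robust_rate = worst_case_pass_rate(perturbed)
drop = relative_drop(nominal_rate, robust_rate)
print(nominal_rate, robust_rate, drop)
```

Because correctness is decided by executing the generated code, each entry is an objective pass/fail outcome, which is what makes a worst-case aggregate like this well defined.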